Quantile hurdle modeling systems and methods for sparse time series prediction applications

ABSTRACT

A server computer may receive and process a plurality of time series data to generate sparse datasets based on sparsity levels. The server computer applies a time series forecasting model to each respective subset of previous data points of the sparse datasets increasingly at the first time granularity to generate a set of prediction values and a set of residuals; applies a regression model to the set of the prediction residuals to generate a set of adjusted residuals for the sparse datasets; and generates a visualized explanation based on the set of the prediction values and the set of adjusted residuals for one or more of the sparse datasets.

BACKGROUND

Time series data represents historic sequenced data over a range of time. Time series data may capture trends and patterns relating to events in different technology fields such as cloud usage, natural phenomenon prediction, service management, user activities, sales analysis, transaction management, etc. Predicting or forecasting time series may be performed by providing historical time series data to a predictive modeling system. The predictive modeling system can forecast those time series data into the future and generate time-series prediction results. The prediction results provide insights into activities or events that may occur in the future. The forecast results may provide valuable information to guide related users and entities to plan their future activities. There are technical challenges in forecasting sparse time series that arise when making predictions based on events that occurred sporadically, as these events may not have repetitive patterns.

BRIEF DESCRIPTION OF THE FIGURES

The foregoing and other aspects of embodiments are described in further detail with reference to the accompanying drawings, in which the same elements in different figures are referred to by common reference numerals. The embodiments are illustrated by way of example and should not be construed to limit the present disclosure.

FIG. 1 illustrates an example computing system for generating time series prediction in accordance with some embodiments disclosed herein.

FIGS. 2A-2C are schematic diagrams of an example time series prediction system in accordance with some embodiments disclosed herein.

FIG. 3 illustrates an example process for generating time series datasets in accordance with some embodiments disclosed herein.

FIG. 4 is a flowchart illustrating an example process of performing time series prediction with a Hurdle Regressor for moderately sparse time series datasets in accordance with some embodiments disclosed herein.

FIG. 5 shows example plots generated based on outputs of a time series prediction model for processing moderately sparse time series datasets in accordance with some embodiments disclosed herein.

FIG. 6 shows scaled residuals corresponding to prediction intervals for moderately sparse time series datasets in accordance with some embodiments disclosed herein.

FIG. 7 shows example plots generated based on outputs of the Quantile regressor for processing moderately sparse time series datasets in accordance with some embodiments disclosed herein.

FIG. 8 is a flowchart illustrating an example process for processing extremely sparse time series datasets to build an explainable quantile Hurdle modeling system to forecast using extremely sparse time series in accordance with some embodiments disclosed herein.

FIG. 9 shows example plots generated based on outputs of a time series prediction model for processing extremely sparse time series datasets in accordance with some embodiments disclosed herein.

FIG. 10 shows example plots generated based on outputs of a quantile regressor for processing extremely sparse time series datasets in accordance with some embodiments disclosed herein.

FIG. 11 shows an example interface presenting prediction explanations of the forecast result in accordance with some embodiments disclosed herein.

FIG. 12 is a block diagram of an example computing device in accordance with some embodiments disclosed herein.

DETAILED DESCRIPTION

Embodiments of the present disclosure provide forecasting techniques for accurately predicting sparse time series data relating to events in various technology fields.

Time series data may be categorized into different groups such as moderately sparse time series data and extremely sparse time series data based on sparsity levels representing the extent of changes occurring in time series data over a range of time. The existing time series prediction models may predict sparse time series data with accurate mean/median values. However, there is a need for a modeling system that also provides accurate prediction intervals and confidence bounds for the sparse time series data.

The existing time series prediction models may predict correct mean values for sparse time series data. However, the existing time series prediction models may generate a wide range of prediction intervals, with a wide upper confidence bound and a wide lower confidence bound on both sides of the predicted mean values, which is caused by assuming Gaussian noise in the sparse time series data during the time series prediction process. The resulting prediction results, with a wide range of prediction intervals and wide confidence bounds, may lead to uncertain estimates of the prediction intervals and may not provide very useful prediction information for sparse time series data, thereby failing to predict related events that occur in the future. Further, the outputs of the existing time series prediction models do not show that the sparse time series have Gaussian distribution features, because the prediction residuals associated with the prediction values and prediction intervals generated by the existing time series prediction models have a non-Gaussian distribution.

The present invention may provide a practical solution to the problems described above with quantile hurdle modeling systems to generate accurate and useful prediction information for sparse time series data.

In one or more embodiments, a quantile hurdle modeling system may include a time series prediction model and a quantile regression model to perform prediction for moderately sparse time series data. The time series prediction model may generate the prediction values and prediction intervals for moderately sparse time series data. The quantile regression model may perform auto-regression to estimate different quantiles of the prediction residuals to generate adjusted prediction residuals with much tighter prediction intervals and confidence bounds which are more relevant to time series data. A sparse time series prediction system may utilize the adjusted prediction residuals and the prediction values to generate an accurate prediction explanation to be presented to users associated with the sparse time series data.

In one or more embodiments, for extremely sparse time series data, a quantile hurdle modeling system may include a Hurdle classifier, a time series prediction model and a quantile regression model considering period probabilities associated with extremely sparse time series data to generate accurate prediction results and improve prediction accuracy. The time series prediction model may generate the prediction values and prediction intervals for extremely sparse time series data. The quantile regression model may perform auto-regression to estimate different quantiles of the prediction residuals to generate adjusted prediction residuals with tight prediction intervals and confidence bounds which are more relevant to time series data. A Hurdle classifier may generate period probabilities for each sub-period of the extremely sparse time series data. A quantile hurdle modeling system may evaluate the generated prediction values and adjusted prediction residuals with the period probabilities of each sub-period of extremely sparse time series data to generate accurate prediction explanations and improve prediction accuracy.

FIG. 1 illustrates an example computing system 100 for generating time series prediction in accordance with some embodiments disclosed herein. The example computing system 100 includes a server computing device or a server computer 120 and a plurality of user computing devices 130 that may be communicatively connected to one another in a cloud-based or hosted environment by a network 110. Server computer 120 may include a processor 121, memory 122, and a communication interface for enabling communication over network 110. Server computer 120 hosts one or more online software financial services or software products, which may be examples of one or more applications 123 stored in memory 122. The one or more applications 123 (e.g., online services and/or applications) are executed by processor 121 for providing various online services or providing one or more websites with services for users to manage their online activities/events related to time series data changes within a time range. For example, the one or more applications 123 may continuously receive and update time series data from various services or institutions via the network 110. Memory 122 may store a sparse time series prediction system or application including data processing model 124, Quantile Hurdle modeling system 125 and other program models, which are implemented in the context of computer-executable instructions executed by the processor 121 of server computer 120 for implementing methods, processes, systems and embodiments described in the present disclosure. Generally, computer-executable instructions include software programs, objects, models, components, data structures, and the like that perform functions or implement specific data types. The computer-executable instructions may be stored in a memory 122 communicatively coupled to a processor 121 and executed by the processor 121 to perform one or more methods described herein. Network 110 may include the Internet and/or other public or private networks or combinations thereof.

A user computing device 130 may include a processor 131, memory 132, and an application browser 133. For example, a user device 130 may be a smartphone, personal computer, tablet, laptop computer, mobile device, or other device. Users may be registered customers or entities of the one or more online applications 123. Each user may create a user account with user information for subscribing and accessing an online software product or service provided by server computer 120. Each user account is stored as a user dataset associated with time series data or datasets described below.

Database 126 of the example system 100 may be included in server computer 120, or coupled to and in communication with the processor 121 of the server computer 120 via the network 110. Database 126 may be a shared remote database, a cloud database, or an on-site central database. Database 126 may receive instructions or data from, and send data to, server computer 120. In some embodiments, server computer 120 may retrieve and aggregate a large amount of time series data such as stream data, transaction data, text, image, video, etc., by accessing other servers or databases from various data sources 140 via network 110. Database 126 may store the aggregated time series data at a daily granularity, a weekly granularity, etc. The historical time series may be represented by time series datasets within a time span in a corresponding time step. Database 126 may store and update historical time series datasets 127 associated with events and corresponding users/entities via the network 110. Database 126 may store the time series datasets for building a Quantile Hurdle modeling system 125 to forecast extremely sparse time series to generate prediction or predicted data 128 associated with events that may occur in the future. Database 126 may store prediction results as predicted data 128 to generate textual and/or graphical reports for associated entities. Details related to building the Quantile Hurdle modeling system 125 will be described below.

FIGS. 2A-2C are schematic diagrams of an example time series prediction system 200 in accordance with the disclosed principles. System 200 may be implemented as computer programs executed by the processor 121 of the server computer 120 for implementing various functionalities of models, modeling systems, algorithms, processes, and embodiments described herein. System 200 may explore modeling techniques (e.g., machine learning algorithms or models) compatible with sparsity levels of time series datasets and generate the predictions for the time series datasets. System 200 may include a data processing model 124 and an explainable Quantile Hurdle modeling system 125.

Time series data may represent event data that includes, but is not limited to, cloud usage (e.g., storage usage or cost analysis), digital signal processing, audio/video processing, natural phenomenon data (e.g., weather information), entity activities or behaviors, national economy, market forecasting, financial service management data (e.g., transactions, payment, or sales), and any other time step data, etc.

A process of feature engineering may be applied by the server computer 120 to historical time series data to extract and construct historical time series datasets 127 associated with time series events. Each time series dataset may be associated with a time series identifier and include a set of data points indicating values at respective time steps. The time step granularity of time series values may be represented by temporal features depending on the granularity of the prediction.

In the absence of any data points between previous adjacent time steps, the time series may be imputed with zeros to satisfy a constant granularity gap amongst data points.
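By way of non-limiting illustration only, the zero imputation described above may be sketched as follows. The function name `impute_zeros`, the daily step, and the sample observations are hypothetical and are not taken from the present disclosure.

```python
from datetime import date, timedelta

def impute_zeros(observations, start, end, step_days=1):
    """Return (date, value) pairs at a constant step, using 0.0 where no data point exists.

    `observations` maps a date to an observed value. The mapping, the step size,
    and the date range are illustrative assumptions.
    """
    series = []
    current = start
    while current <= end:
        series.append((current, observations.get(current, 0.0)))
        current += timedelta(days=step_days)
    return series

# Two observed events within one week; every other day is filled with zero.
observed = {date(2023, 1, 2): 5.0, date(2023, 1, 5): 2.5}
daily = impute_zeros(observed, date(2023, 1, 1), date(2023, 1, 7))
values = [value for _, value in daily]
print(values)
```

The resulting series has one value per time step, so downstream models may assume a constant gap between adjacent data points.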

System 200 may include a data processing model 124 to process and group time series datasets 127 into two groups of moderately sparse time series datasets 202 and extremely sparse time series datasets 204. An appropriate time series prediction model may be selected to be particularly suited to the type of time series, such as a sparsity level of time series datasets. The grouped time series datasets 127 may be used as time series training data and time series test data to train the corresponding models and modeling system.

A given time series may consist of systematic components including the average value in the series, trend indicative of the increasing or decreasing value in the series, and seasonality indicative of the repeating short-term cycle in the series, together with one non-systematic component, namely random variation or noise in the series. A given time series dataset may include a value and a set of temporal features. A set of temporal features may include date, day, week, month, working day, day of the week, week of the month, quarter, month start, etc.
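As one possible illustration, a set of temporal features of the kind listed above may be derived from a calendar date as sketched below. The feature names, the Monday-to-Friday definition of a working day (holidays are ignored), and the week-of-month formula are assumptions made for this sketch only.

```python
from datetime import date

def temporal_features(d):
    """Derive illustrative temporal features from a calendar date."""
    return {
        "day": d.day,
        "week": d.isocalendar()[1],             # ISO week number of the year
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "day_of_week": d.weekday(),             # Monday = 0, Sunday = 6
        "week_of_month": (d.day - 1) // 7 + 1,  # days 1-7 fall in week 1, and so on
        "working_day": d.weekday() < 5,         # assumption: Monday to Friday
        "month_start": d.day == 1,
    }

features = temporal_features(date(2023, 3, 1))
print(features)
```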

For moderately sparse time series datasets, Hurdle Regressor 208 may be used to generate time series prediction. As illustrated in FIGS. 2A-2C, Hurdle regressor 208 may include a time series prediction model 2081 followed by a quantile regressor 2082 (e.g., a quantile regression model) to forecast target prediction values for the moderately sparse time series datasets 202. The time series prediction model 2081 may be a Bayesian linear regression model such as a Structural Bayesian Time Series (SBTS) model.

The time series prediction model 2081 may predict and generate a prediction value or a mean/median value \hat{x}_t at a time step t_s (e.g., weekly, or monthly, etc.) based on a subset of time series datasets or a subset of previous data points with actual values (x_{t-n}, ..., x_{t-1}) before the time step t_s corresponding to previous events. The time series prediction model 2081 may further generate a prediction interval and a prediction residual (x_t - \hat{x}_t) at the time step t_s based on the subset of time series datasets. A prediction interval may represent a range of likely prediction values of an output variable from the time series prediction model 2081 at the time step t_s. A residual (x_t - \hat{x}_t) may be calculated as the difference between the actual time series value x_t and the prediction value \hat{x}_t at the time step t_s.

Quantile regressor 2082 may be a quantile regression based machine learning model. As illustrated in FIGS. 2B-2C, Quantile regressor 2082 may perform quantile-based confidence bound computation based on the residuals (x_t - \hat{x}_t) generated by the time series prediction model 2081 and a set of model parameters b_i. Quantile regressor 2082 may estimate the confidence bound to generate accurate prediction intervals.

Quantile regressor 2082 may be represented as an equation (1):

Q(x_t - \hat{x}_t) = g(t, t'', x_{t-1}, \hat{x}_t, \hat{x}_{t-1}, \ldots, \hat{x}_{t-n})  (1)

Quantile regressor 2082 may be trained with prediction values \hat{x}_t, the residuals (x_t - \hat{x}_t) from the time series prediction model 2081, and a set of parameters b_i to estimate the probability distribution of the residuals (x_t - \hat{x}_t) around the mean/median value \hat{x}_t. In some embodiments, the set of model parameters b_i may be generated by fitting the residuals (x_t - \hat{x}_t) into the quantile regressor model 2082. For example, the set of parameters b_i may comprise a set of quantile values and a plurality of temporal features (t, t'') including a date, day, week, month, day of the week, week of the month, etc. For example, Quantile regressor 2082 may be trained to estimate and predict the time series residuals at different quantiles such as 10%, 50%, 90%, etc. In some embodiments, the quantile regression may provide a non-parametric way of estimating probabilistic prediction by utilizing quantile loss to directly model the quantile level.
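To illustrate how a quantile loss may directly model a quantile level, the following minimal sketch evaluates the quantile (pinball) loss for a constant prediction and selects the observed residual that minimizes it. This is a simplification assumed for illustration only: it uses no temporal features and no auto-regressive inputs, and the residual values and function names are hypothetical.

```python
def pinball_loss(residuals, prediction, q):
    """Average quantile (pinball) loss of a constant prediction at quantile level q."""
    total = 0.0
    for r in residuals:
        diff = r - prediction
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(residuals)

def best_constant_quantile(residuals, q):
    """Return the observed residual value that minimizes the pinball loss at level q."""
    candidates = sorted(set(residuals))
    return min(candidates, key=lambda c: pinball_loss(residuals, c, q))

# Hypothetical residuals of a sparse series: mostly zero with a few positive spikes.
residuals = [0.0] * 7 + [1.0, 3.0, 4.0, 9.0]
q10 = best_constant_quantile(residuals, 0.1)
q50 = best_constant_quantile(residuals, 0.5)
q90 = best_constant_quantile(residuals, 0.9)
print(q10, q50, q90)
```

In this example the 10% and 50% estimates coincide at zero while the 90% estimate is positive, so the estimated quantiles are asymmetric around the median without any distributional assumption.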

The Quantile regressor 2082 may perform auto-regression to estimate different quantiles of the residual distribution for non-Gaussian noise to generate accurate prediction 210 for the moderately sparse time series datasets 202.

Referring to FIGS. 2A-2C, for the extremely sparse time series datasets 204, the explainable Quantile Hurdle modeling system 125 may include a Hurdle classifier 206, a Hurdle regressor 208 and a probability filter 212 to generate time series prediction 214.

The time series prediction model 2081 may predict a prediction value \hat{x}_t at a time step t_s (e.g., weekly, or monthly, etc.) based on a subset of extremely sparse time series datasets 204 or a subset of previous data points with actual values (x_{t-n}, ..., x_{t-1}) before the time step t_s corresponding to previous events. The time series prediction model 2081 may further generate a prediction interval and a residual (x_t - \hat{x}_t) at the time step t_s based on the subset of time series datasets or data points. Quantile regressor 2082 may perform quantile-based confidence bound computation based on the residuals (x_t - \hat{x}_t) generated by the time series prediction model and a set of parameters c_i to estimate the confidence bound to generate prediction intervals. The set of model parameters c_i may be tuned to the extremely sparse time series datasets with different values for trend and/or seasonality components as opposed to the set of model parameters b_i for the moderately sparse time series.

Hurdle regressor 208 may be implemented by respective algorithms of various machine learning models suitable for extremely sparse time series datasets 204.

Hurdle classifier 206 may predict whether an event relating to the time series is likely to occur by generating a probability of the event at a time step granularity (e.g., daily, weekly, etc.). For the extremely sparse time series datasets 204, Hurdle classifier 206 may be trained to predict the probability of the event for each data point at a daily time step. For example, Hurdle classifier 206 may generate a set of probabilities (p1, p2 . . . p7) for a sub-period of time series datasets within a week.

For a sub-period of time series datasets, the system 200 may use a probability filter 212 to determine a period probability p(w) of at least one event occurring in a week based on an equation (2):

p(w)=1-(1-p1)*(1-p2)* . . . *(1-p7)  (2)

The probability filter 212 may function as a binary filter by comparing the period probability p(w) to a probability threshold. The output of Hurdle Regressor 208 may be determined to be the prediction value for the set of corresponding datasets within the week when the probability p(w) is above the probability threshold. Details about processes related to system 200 will be described below.
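As a non-limiting sketch, equation (2) and the binary filter may be expressed as shown below. The threshold value of 0.5, the function names, and the choice to output zero when the period probability does not exceed the threshold are assumptions made for illustration; the disclosure does not fix these details.

```python
def period_probability(daily_probabilities):
    """Probability that at least one event occurs in the period, per equation (2)."""
    no_event = 1.0
    for p in daily_probabilities:
        no_event *= (1.0 - p)
    return 1.0 - no_event

def apply_probability_filter(regressor_output, daily_probabilities, threshold=0.5):
    """Pass the regressor output through only when p(w) is above the threshold.

    Returning 0.0 otherwise is an assumption of this sketch.
    """
    p_w = period_probability(daily_probabilities)
    return regressor_output if p_w > threshold else 0.0

likely_week = [0.1] * 7      # p(w) = 1 - 0.9**7, roughly 0.52
unlikely_week = [0.02] * 7   # p(w) = 1 - 0.98**7, roughly 0.13
kept = apply_probability_filter(120.0, likely_week)
suppressed = apply_probability_filter(120.0, unlikely_week)
print(period_probability(likely_week), kept, suppressed)
```

Note that seven modest daily probabilities of 0.1 combine into a weekly probability above one half, which is why the filter operates on the period probability rather than on any single daily value.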

FIG. 3 illustrates an example process that may be executed to generate time series datasets for training the models and modeling system of the system 200 to generate time series prediction in accordance with some embodiments of the present disclosure.

At 302, the processor 121 may receive historical time series datasets 127 from the database 126. For example, each time series dataset may represent time series digital values or numbers corresponding to a set of features associated with related events. Each time series dataset may be graphically presented as a set of data points indicative of values at respective time steps over a time window or a time frame given the time series. The time series datasets may have varying sparsity levels. The sparsity level may be related to the percentage of nonzero values, periodicity metrics, the number of peaks, and the length of the time series.

At 304, the processor 121 may identify and determine a sparsity level of each time series dataset for determining whether each time series dataset is a sparse time series. The processor 121 may process each time series dataset to determine whether the time series dataset has a sparsity level above or below a sparsity threshold. For example, the sparsity level of each time series dataset may be determined by calculating a percentage or a ratio of nonzero values of the time series dataset over a time period.

At 306, based on the sparsity levels of the time series datasets, the processor 121 may group the time series datasets into two groups including moderately sparse time series datasets (e.g., a first set of time series datasets) and extremely sparse time series datasets (e.g., a second set of time series datasets). If the sparsity levels of time series datasets are determined to be lower than the sparsity threshold, the sparse time series datasets may be grouped as extremely sparse time series datasets. If the sparsity levels of time series datasets are determined to be above or equal to the sparsity threshold, the sparse time series datasets may be grouped as moderately sparse time series datasets. In some embodiments, the sparsity thresholds may vary across different ranges for time series data. The sparsity thresholds may depend on qualities of the time series, such as periodicity, non-stationarity, etc. In one or more embodiments, a time series dataset may be categorized or grouped as an extremely sparse time series dataset if a ratio of the number of data points to the length of the time series is less than or equal to 0.1. A time series dataset may be categorized or grouped as a moderately sparse time series dataset if the ratio of the number of data points to the length of the time series is more than 0.1 and less than or equal to 0.5. A time series dataset may be grouped as a non-sparse time series if the ratio of the number of data points to the length of the time series is more than 0.5. These values are presented as examples that can be used in some embodiments, although it may be possible to use different values to categorize sparsity in other embodiments.
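The example grouping at 304 and 306 may be sketched as follows, using the example ratios of 0.1 and 0.5 given above. This sketch assumes a zero-imputed series in which the number of data points equals the number of nonzero values; the function names and sample series are hypothetical.

```python
def sparsity_ratio(values):
    """Ratio of nonzero values to the length of the (zero-imputed) time series."""
    nonzero = sum(1 for v in values if v != 0)
    return nonzero / len(values)

def sparsity_group(values, extreme_max=0.1, moderate_max=0.5):
    """Group a series using the example ratios of 0.1 and 0.5 from the description."""
    ratio = sparsity_ratio(values)
    if ratio <= extreme_max:
        return "extremely sparse"
    if ratio <= moderate_max:
        return "moderately sparse"
    return "non-sparse"

extreme = [0] * 19 + [3]                   # ratio 0.05
moderate = [0, 2, 0, 0, 1, 0, 0, 4, 0, 0]  # ratio 0.3
dense = [1, 2, 0, 3, 4]                    # ratio 0.8
groups = [sparsity_group(s) for s in (extreme, moderate, dense)]
print(groups)
```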

For the moderately sparse time series datasets, the time series prediction model 2081 may be able to predict the mean/median values accurately. However, the prediction may result in wide and inflated confidence bounds, which may lead to uncertain estimates of the prediction intervals and may not provide very useful prediction information for sparse time series data. The quantile regressor may be trained based on the residuals from the time series prediction model with a set of parameters b_i to generate an accurate quantile-based confidence bound.

FIG. 4 is a flowchart illustrating an example method and process 400 of generating time series prediction with a Hurdle Regressor for moderately sparse time series datasets in accordance with some embodiments disclosed herein. FIG. 5 shows example plots generated based on outputs of the time series prediction model 2081 for processing moderately sparse time series datasets or a first set of datasets 202.

At 402, server computer 120 may receive a first set of time series datasets 202 from database 126. The processor 121 may perform operations to generate different groups of time series datasets including actual past training datasets 51 and actual future test datasets 53. The first set of time series datasets 202 may each correspond to a data point indicative of a data value. Each data point or dataset may correspond to each respective subset of previous data points at a first time granularity within a time window. For example, a first time granularity may be a daily or weekly granularity. Referring to FIG. 2C, a time window may be multiple granularity time periods corresponding to each subset of previous datasets at time steps (t-1, . . . , t-n) before a time step t where the data value \hat{x}_t may be predicted.
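For illustration only, the pairing of each data point with its respective subset of n previous data points may be sketched as below. The helper name `previous_point_windows`, the window length, and the sample series are assumptions of this sketch.

```python
def previous_point_windows(values, n):
    """Pair each value x_t with the n previous values (x_{t-n}, ..., x_{t-1})."""
    pairs = []
    for t in range(n, len(values)):
        pairs.append((values[t - n:t], values[t]))
    return pairs

series = [0, 3, 0, 0, 5, 0]
pairs = previous_point_windows(series, 3)
for window, target in pairs:
    print(window, "->", target)
```

The first n data points have no complete window and are therefore used only as inputs, not as prediction targets, in this sketch.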

At 404, the processor 121 may train a time series prediction model 2081 with actual past training datasets 51 and actual future test datasets 53. In some embodiments, a Bayesian linear model may be trained with a set of coefficients a_i to estimate the prediction values or mean/median values \hat{x}_t for the respective time series datasets.

At 406, the time series prediction model 2081 may be executed by the processor 121 and applied to each respective subset of previous data points to generate a first set of predicted mean/median values (e.g., predicted past mean data 52) and a first set of time series residuals. The trained time series prediction model 2081 may generate accurate mean values as predicted past mean data or datasets 52. As illustrated in FIG. 5, the time series prediction model 2081 may also generate predicted confidence bounds (training) 55 shown as a blue shaded area, prediction intervals 57 and corresponding time series residuals (x_t - \hat{x}_t) for each data point of the actual past (training) dataset 51. For each data point of the actual future (test) dataset 53, the time series prediction model 2081 may generate predicted mean values 54 (shown as predicted future mean data in FIG. 5), predicted confidence bounds (test) 56, prediction intervals 58 and corresponding time series residuals.

A prediction interval 57 may represent a range of likely prediction values of an output variable from the time series prediction model 2081. A prediction interval 57 may be a range of values between a maximum upper confidence bound and a minimum lower confidence bound for each corresponding time series dataset or data point. The existing models normally assume that the time series residuals are subject to Gaussian noise. As illustrated in FIG. 5, the time series prediction model may generate the prediction 210 resulting in very wide prediction intervals 57, shown as a blue shaded area on both sides of the data points of the predicted mean values 52. The wide prediction interval 57 may represent uncertainty about the prediction values of time series data points, since the corresponding residuals may not have a normal or Gaussian distribution.

FIG. 6 shows scaled residuals corresponding to the prediction intervals 57 for moderately sparse time series datasets 202. The scaled residuals are not normally distributed, that is, they do not follow a Gaussian distribution. The time series prediction model may be a Structural Bayesian Time Series (SBTS) model or any of many other state-of-the-art models. The existing time series prediction models may assume a Gaussian noise distribution of the residuals, which leads to inaccurate uncertainty estimates of the prediction intervals.
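The effect of a Gaussian noise assumption on skewed residuals may be illustrated with the following sketch, which compares a symmetric 80% interval derived from the mean and standard deviation with an interval taken from empirical 10% and 90% quantiles. The residual values, the 80% coverage level, and the order-statistic quantile rule are assumptions chosen for this example and do not reproduce the data of FIG. 6.

```python
import math
from statistics import NormalDist, mean, pstdev

def empirical_quantile(values, q):
    """Order-statistic quantile: the smallest value with at least a fraction q at or below it."""
    ordered = sorted(values)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]

# Hypothetical non-negative residuals of a sparse series: mostly zero, a few spikes.
residuals = [0.0] * 8 + [2.0, 5.0, 6.0, 12.0]

# Symmetric interval under a Gaussian assumption (80% central coverage).
z = NormalDist().inv_cdf(0.9)
center, spread = mean(residuals), pstdev(residuals)
gaussian_low, gaussian_high = center - z * spread, center + z * spread

# Interval from empirical quantiles, with no distributional assumption.
empirical_low = empirical_quantile(residuals, 0.1)
empirical_high = empirical_quantile(residuals, 0.9)

print((gaussian_low, gaussian_high), (empirical_low, empirical_high))
```

In this example the Gaussian lower bound falls below zero even though no residual is negative, whereas the empirical interval begins at zero and extends only toward the observed positive residuals.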

At 408, the processor 121 may train a quantile regressor 2082 to perform quantile-based confidence bound computation based on the time series residuals generated by the time series prediction model 2081 along with a first set of model parameters b_i to predict the confidence bounds for improving the prediction interval estimation.

At 410, the processor 121 may apply the quantile regressor 2082 to the first set of residuals generated by the time series prediction model 2081 to generate a first set of adjusted residuals. The quantile regressor 2082 may perform quantile-based confidence bound computation based on the prediction residuals (x_t - \hat{x}_t) from the time series prediction model for actual past (training) data and actual future (test) data from the moderately sparse time series datasets 202. Quantile regressor 2082 may make no assumptions about the distribution of the prediction residuals from the time series prediction model 2081. Referring to FIG. 2A, Quantile regressor 2082 may generate, as the prediction 210, the first set of prediction values, a first set of adjusted residuals and a first set of accurate adjusted prediction intervals based on accurate predicted mean values from the time series prediction model 2081.

FIG. 7 shows example plots based on outputs of the quantile regressor 2082 for processing moderately sparse time series datasets related to process 400 in accordance with some embodiments disclosed herein. As shown in FIG. 7, quantile regressor 2082 may generate the prediction 210 with the predicted mean values from the time series prediction model 2081 shown in blue and accurate adjusted prediction intervals 77 with tight confidence bounds shown as a blue shaded area. For example, the prediction interval 77 is located only on one side of the predicted mean value at time step t_s, which reflects the non-Gaussian noise of the time series and the improved accuracy of the prediction intervals.

At 412, the processor 121 may generate a visualized explanation to present the generated prediction 210 including and/or based on prediction values and prediction intervals for respective data points of the first set of time series datasets 202.

FIG. 8 is a flowchart illustrating an example process 800 for processing extremely sparse time series datasets to build an explainable Quantile Hurdle modeling system 125 to forecast extremely sparse time series in the future in accordance with some embodiments disclosed herein.

FIG. 9 shows example plots generated based on outputs of the time series prediction model for processing extremely sparse time series datasets in accordance with some embodiments disclosed herein.

At 802, server computer 120 may receive extremely sparse time series datasets or a second set of time series datasets 204 from database 126. The processor 121 may perform operations to generate different groups of time series datasets including actual past training datasets 91 and actual future test datasets 93 as shown in FIG. 9.

At 804, the processor 121 may train a time series prediction model 2081with actual training datasets 91 and actual test datasets 93.

At 806, the trained time series prediction model 2081 may be executed by the processor 121 and applied to each respective subset of previous data points of the extremely sparse time series datasets to generate a second set of predicted mean/median values (e.g., predicted mean of (training) data 92), a second set of prediction values, a second set of prediction intervals, and a second set of time series residuals. As illustrated in FIG. 9, the time series prediction model 2081 may also generate predicted confidence bounds (training) 95 shown as a blue shaded area and prediction intervals for respective datasets. For each data point of the actual test datasets 93, the time series prediction model 2081 may generate predicted mean values 94 (e.g., predicted mean (test) data in FIG. 9), predicted confidence bounds (test) 96 with a wide upper confidence bound shown as a blue area, and the second set of prediction intervals corresponding to the second set of time series residuals.

At 808, the processor 121 may train a quantile regressor 2082 to perform quantile-based confidence bound computation based on a second set of the time series residuals generated by the time series prediction model 2081 along with a second set of model parameters c_(i) to predict the confidence bounds for improving the prediction interval estimation.

At 810, the processor 121 may apply the quantile regressor 2082 to the second set of residuals generated by the time series prediction model 2081 to generate a second set of prediction values and a second set of adjusted residuals. The quantile regressor 2082 may perform quantile-based confidence bound computation based on the residuals ($x_t - \hat{x}_t$) from the time series prediction model for actual training datasets 91 and actual test datasets 93 from the extremely sparse time series datasets 204. Referring to FIG. 2A, the quantile regressor 2082 may generate the prediction with a second set of the prediction values and a second set of accurate adjusted prediction intervals as one of the outputs of Hurdle Regressor 208 based on accurate predicted mean values.
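The interval adjustment at 808 and 810 can be illustrated with a minimal sketch. It does not reproduce the trained quantile regressor 2082. Instead it substitutes a simpler technique, empirical quantiles of the residuals, and shifts each forecast by those quantiles to obtain lower and upper bounds. The function names, the 0.1 and 0.9 quantile levels, and the linear interpolation rule are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def empirical_quantile(values: Sequence[float], q: float) -> float:
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def residual_intervals(
    actual: Sequence[float],
    predicted: Sequence[float],
    forecast: Sequence[float],
    lower_q: float = 0.1,   # assumed quantile level, not from the source
    upper_q: float = 0.9,   # assumed quantile level, not from the source
) -> List[Tuple[float, float]]:
    """Build prediction intervals from empirical quantiles of the residuals."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Residual for each data point: observed value minus predicted value.
    residuals = [x - x_hat for x, x_hat in zip(actual, predicted)]
    low = empirical_quantile(residuals, lower_q)
    high = empirical_quantile(residuals, upper_q)
    # Shift every forecast by the residual quantiles to get its bounds.
    return [(f + low, f + high) for f in forecast]
```

A quantile regressor conditioned on temporal features would let the bounds vary from one time step to the next, whereas this sketch applies the same offsets to every forecast.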

FIG. 10 shows example plots generated based on outputs of quantile regressor 2082 for processing extremely sparse time series datasets (the second set of time series datasets 204). As shown in FIG. 10, quantile regressor 2082 may generate the prediction 214 with predicted mean values from the time series prediction model 2081 shown in blue and accurate prediction intervals with tight confidence bounds shown as a blue shaded area. FIG. 10 shows how the quantile regressor 2082 can improve the estimation of prediction intervals for extremely sparse time series in comparison to the wide prediction intervals from the time series prediction model 2081 in FIG. 9.

At 812, referring to FIG. 2A, Hurdle classifier 206 may be executed by the processor 121 to predict whether an event relating to the time series is likely to occur. Hurdle classifier 206 may be trained with the extremely sparse time series datasets or the second set of time series datasets 204 to predict a probability of the event at a first time granularity (e.g., daily). For example, based on the extremely sparse time series datasets 204, Hurdle classifier 206 may be trained to predict the probability of the event for each data point at a daily time step (the first time granularity). Hurdle classifier 206 may generate a set of probabilities (p1, p2, ..., p7) for a sub-period of time series datasets within a time period, such as a week or a time period of a second time granularity.

At 814, for the sub-period of time series datasets, the processor 121 may determine a period probability p(w) that an event occurs in the time period based on equation (2) described above.

At 816, the processor 121 may execute an algorithm of a probability filter 212 to compare the period probability p(w) to a probability threshold. The probability threshold may be unique to respective time series datasets. The threshold value may be different for each time series and adjusted based on historic time series data.

At 818, when the processor 121 determines that the period probability p(w) is equal to or below the probability threshold, the processor may set the prediction values or the output of Hurdle Regressor 208 to be 0 as the prediction result 214 for the corresponding sub-period of time series datasets within the time period.

At 820, when the processor 121 determines that the period probability p(w) is above the probability threshold, the processor may confirm the second set of the prediction values and the second set of the adjusted prediction intervals from quantile regressor 2082 for the sub-period of datasets within the time period. The prediction result 214 may include the second set of the prediction values and the second set of the adjusted prediction intervals on a weekly granularity. The prediction result 214 may be aggregated to a suitable prediction granularity, such as a monthly granularity.
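The gating performed at 816 through 820 can be sketched as follows. This is a minimal illustration that takes the period probability p(w) as an already computed input, since equation (2) is defined elsewhere in the description. The function name, the argument names, and the use of `None` for the intervals of a zeroed sub-period are assumptions; the description states only that the prediction values are set to 0 in that case.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]


def apply_probability_filter(
    period_probability: float,
    threshold: float,
    values: Sequence[float],
    intervals: Sequence[Interval],
) -> Tuple[List[float], Optional[List[Interval]]]:
    """Gate regressor outputs for one sub-period using the period probability."""
    if not 0.0 <= period_probability <= 1.0:
        raise ValueError("period_probability must be in [0, 1]")
    if len(values) != len(intervals):
        raise ValueError("values and intervals must have the same length")
    if period_probability <= threshold:
        # Step 818: equal to or below the threshold, so predict zero.
        # Returning None for the intervals is a placeholder assumption.
        return [0.0] * len(values), None
    # Step 820: above the threshold, so confirm the regressor outputs.
    return list(values), list(intervals)
```

Because the threshold may differ for each time series, a caller would typically look up the threshold for the series at hand before invoking the filter.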

At 822, the processor 121 may generate a visualized explanation to present the generated prediction 214 including or based on the second set of the prediction values and a second set of the adjusted prediction intervals for respective data points of the second time series datasets 204.

FIG. 11 shows an example user interface presenting the prediction explanations of the forecast result in accordance with some embodiments disclosed herein. A prediction explanation may be generated using Shapley values for presenting prediction results associated with events in the future such that related users may understand the forecasted results of their historical activities and events. The prediction explanation may include a plurality of temporal features associated with events and time series of the events, and text explanation based on the generated prediction 210 or 214 including corresponding prediction values and prediction intervals.

Referring to FIG. 2A, without considering the sparsity level, the process 400 may be used to apply Hurdle Regressor 208 to a set of sparse time series datasets to generate prediction 210. The process 800 may be used to apply Hurdle classifier 206 and Hurdle Regressor 208 to generate prediction 214. The processor 121 may compare the prediction accuracy of the prediction 210 and prediction 214 to determine whether to choose process 400 or process 800 as a suitable process to generate a final prediction.
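The comparison between prediction 210 and prediction 214 can be sketched as below. The description does not name the accuracy measure used for this choice, so the sketch assumes mean absolute error against held-out actual values. The function names and the tie-breaking rule in favor of process 400 are likewise assumptions.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def choose_process(
    actual: Sequence[float],
    prediction_210: Sequence[float],
    prediction_214: Sequence[float],
) -> str:
    """Pick the process whose prediction has the lower error on the actuals."""
    error_400 = mean_absolute_error(actual, prediction_210)
    error_800 = mean_absolute_error(actual, prediction_214)
    # On a tie, keep the simpler process 400 (an assumed tie-breaking rule).
    return "process 800" if error_800 < error_400 else "process 400"
```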

In some embodiments, different time series prediction models or algorithms may be selected for quantile hurdle modeling systems to perform sparse time series data prediction based on the model performance evaluation. A metric called Normalized Mean Absolute Error (NMAE) may be used to evaluate the improvement and to compare the performance of the different models, algorithms, or quantile hurdle modeling systems.

$\mathrm{NMAE} = \frac{\text{Mean Absolute Error from a model (MAE)}}{\text{MAE of the trivial predictor}}$

The MAE of the trivial predictor refers to the error of predictions determined using the mean of historic data. This metric compares the performance of the algorithm to the mean prediction of the historic data. A lower NMAE value indicates better performance of a model or quantile hurdle modeling system. If the NMAE value is larger than 1, the model may generate predictions that are less accurate than the mean of historic data.
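The NMAE computation can be sketched in a few lines. The sketch assumes that both the model and the trivial predictor are scored against the same set of actual values, and that the trivial predictor outputs the mean of a separate historic series at every time step. The function and argument names are illustrative.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def nmae(
    actual: Sequence[float],
    predicted: Sequence[float],
    history: Sequence[float],
) -> float:
    """Model MAE divided by the MAE of a mean-of-history predictor."""
    if not history:
        raise ValueError("history must not be empty")
    # Trivial predictor: the historic mean repeated for every time step.
    baseline = sum(history) / len(history)
    trivial_mae = mean_absolute_error(actual, [baseline] * len(actual))
    if trivial_mae == 0:
        raise ValueError("trivial predictor has zero error; NMAE is undefined")
    return mean_absolute_error(actual, predicted) / trivial_mae
```

With `history = [0, 0, 4, 0]` the trivial predictor outputs 1.0 at every step. A model that scores an NMAE below 1 on the evaluation values beats that baseline, and one that scores above 1 does worse.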

In some embodiments, the disclosed principles provide a practical technological solution to effectively and accurately generate predicted time series data. Embodiments of the present disclosure provide advantages and improvements in processing moderately and extremely sparse time series datasets for predicting future values. For example, the embodiments described herein provide computational efficiency and predictive accuracy with related machine learning tasks. The advantages of the disclosed principles include providing accuracy in extremely sparse time series prediction. The disclosed methods may assist users to process historical time series to predict corresponding events that may occur in the future. The generated prediction explanations may provide better service and/or personalized service tailored to associated users or entities.

The embodiments described herein may provide a real time solution with faster processing and delivery of event prediction that satisfies user expectations and improves user experience when the users interact with the system for managing event-related time series activities and obtaining related information and advice to manage their registered accounts with the online services.

FIG. 12 is a block diagram of an example computing device 1200 that may be utilized to execute embodiments to implement processes including various features and functional operations as described herein. For example, computing device 1200 may function as server computer 120, user computing device 130, or a portion or combination thereof. In some implementations, the computing device 1200 may include one or more processors 1202, one or more input devices 1204, one or more display devices or output devices 1206, one or more communication interfaces 1208, and memory 1210. Each of these components may be coupled by bus 1212, or in the case of distributed computer systems, one or more of these components may be located remotely and accessed via a network. The computing device 1200 may be implemented on any electronic device to execute software applications derived from program instructions stored in the memory 1210, including but not limited to personal computers, servers, smartphones, media players, electronic tablets, game consoles, email devices, etc.

Processor(s) 1202 may use any known processor technology, including but not limited to graphics processors and multi-core processors. Suitable processors for the execution of a program of instructions may include, by way of example, both general and special purpose microprocessors, and the sole processor or one of multiple processors or cores, of any kind of computer. Generally, a processor may receive instructions and data from a read-only memory or a random-access memory or both. The essential elements of a computer may include a processor for executing instructions and one or more memories for storing instructions and data. Generally, a computer may also include, or be operatively coupled to communicate with, one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data may include all forms of non-transitory memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).

Input devices 1204 may be any known input device technology, including but not limited to a keyboard (including a virtual keyboard), mouse, track ball, and touch-sensitive pad or display. To provide for interaction with a user, the features and functional operations described in the disclosed embodiments may be implemented on a computer having a display device 1206 such as a CRT (cathode ray tube) or LCD (liquid crystal display) monitor for displaying information to the user and a keyboard and a pointing device such as a mouse or a trackball by which the user can provide input to the computer. Display device 1206 may be any known display technology, including but not limited to display devices using Liquid Crystal Display (LCD) or Light Emitting Diode (LED) technology.

Communication interfaces 1208 may be configured to enable computing device 1200 to communicate with another computing or network device across a network, such as via a wired connection, a wireless connection, or a combination of wired and wireless connections. For example, communication interfaces 1208 may include an Ethernet interface, an optical interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, a Wi-Fi interface, a cellular network interface, or the like.

Memory 1210 may be any computer-readable medium that participates in providing computer program instructions and data to processor(s) 1202 for execution, including without limitation, non-transitory computer-readable storage media (e.g., optical disks, magnetic disks, flash drives, etc.), or volatile media (e.g., SDRAM, ROM, etc.). Memory 1210 may include various instructions for implementing an operating system 1214 (e.g., Mac OS®, Windows®, Linux). The operating system may be multi-user, multiprocessing, multitasking, multithreading, real-time, and the like. The operating system may perform basic tasks, including but not limited to: recognizing inputs from input devices 1204; sending output to display device 1206; keeping track of files and directories on memory 1210; controlling peripheral devices (e.g., disk drives, printers, etc.) which can be controlled directly or through an I/O controller; and managing traffic on bus 1212. Bus 1212 may be any known internal or external bus technology, including but not limited to ISA, EISA, PCI, PCI Express, USB, Serial ATA or FireWire.

Network communications instructions 1216 may establish and maintain network connections (e.g., software applications for implementing communication protocols, such as TCP/IP, HTTP, Ethernet, telephony, etc.). Application(s) 1220 and program modules 1218 may include software application(s) and different functional program modules which are executed by processor(s) 1202 to implement the processes described herein and/or other processes. The program modules 1218 may include but are not limited to software programs, machine learning models, objects, components, and data structures that are configured to perform tasks or implement the processes described herein. The processes described herein may also be implemented in operating system 1214.

The features and functional operations described in the disclosed embodiments may be implemented in one or more computer programs that may be executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. A computer program is a set of instructions that can be used, directly or indirectly, in a computer to perform a certain activity or bring about a certain result. A computer program may be written in any form of programming language (e.g., Objective-C, Java), including compiled or interpreted languages, and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

The features and functional operations described in the disclosed embodiments may be implemented in a computer system that includes a back-end component, such as a data server, or that includes a middleware component, such as a server computer or an Internet server, or that includes a front-end component, such as a user device having a graphical user interface or an Internet browser, or any combination thereof. The components of the system may be connected by any form or medium of digital data communication such as a communication network. Examples of communication networks include, e.g., a telephone network, a LAN, a WAN, and the computers and networks forming the Internet.

The computer system may include user computing devices and server computers. A user computing device and a server computer may generally be remote from each other and may typically interact through a network. The relationship of user computing devices and server computers may arise by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Communication between various network and computing devices 1200 of a computing system may be facilitated by one or more application programming interfaces (APIs). APIs of the system may be proprietary and/or may be examples available to those of ordinary skill in the art such as Amazon® Web Services (AWS) APIs or the like. The API may be implemented as one or more calls in program code that send or receive one or more parameters through a parameter list or other structure based on a call convention defined in an API specification document. One or more features and functional operations described in the disclosed embodiments may be implemented using an API. An API may define one or more parameters that are passed between an application and other software instructions/code (e.g., an operating system, library routine, function) that provides a service, that provides data, or that performs an operation or a computation. A parameter may be a constant, a key, a data structure, an object, an object class, a variable, a data type, a pointer, an array, a list, or another call.

While various embodiments have been described above, it should be understood that they have been presented by way of example and not limitation. It will be apparent to persons skilled in the relevant art(s) that various changes in form and detail can be made therein without departing from the spirit and scope. In fact, after reading the above description, it will be apparent to one skilled in the relevant art(s) how to implement alternative embodiments. For example, other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Accordingly, other implementations are within the scope of the following claims.

In addition, it should be understood that any figures which highlight the functionality and advantages are presented for example purposes only. The disclosed methodology and system are each sufficiently flexible and configurable such that they may be utilized in ways other than that shown.

Although the term “at least one” may often be used in the specification, claims and drawings, the terms “a”, “an”, “the”, “said”, etc. also signify “at least one” or “the at least one” in the specification, claims and drawings.

Finally, it is the applicant's intent that only claims that include the express language “means for” or “step for” be interpreted under 35 U.S.C. 112(f). Claims that do not expressly include the phrase “means for” or “step for” are not to be interpreted under 35 U.S.C. 112(f).

What is claimed is:
 1. A method implemented by a computing device for generating time series prediction, the computing device comprising a processor and a memory, the memory storing executable instructions that when executed by the processor cause the computing device to perform processing comprising: receiving, from a database in communication with the processor, a plurality of time series datasets each corresponding to a data point indicative of a data value, each data point corresponding to each respective subset of previous data points at a first time granularity within a time window; generating a first set of sparse datasets having sparsity levels equal to or below a sparsity threshold and a second set of sparse datasets having sparsity levels above the sparsity threshold; applying a time series forecasting model to each respective subset of previous data points of the first set of sparse datasets increasingly at the first time granularity to generate a first set of prediction values and a first set of residuals; applying a regression model to the first set of the prediction residuals to generate a first set of adjusted residuals for the first set of sparse datasets; and generate a visualized explanation based on the first set of the prediction values and the first set of adjusted residuals for one or more of the first set of sparse datasets.
 2. The method of claim 1, wherein the processing further comprises calculating a percentage of nonzero values of the time series dataset as each respective sparsity level of each respective time series dataset.
 3. The method of claim 1, wherein the processing further comprises: applying a time series forecasting model to each respective subset of previous data points of the second set of sparse datasets increasingly at the first time granularity to generate a second set of prediction values and a second set of residuals; and applying a regression model to the second set of the residuals to generate a second set of adjusted residuals for the second set of sparse datasets.
 4. The method of claim 3, wherein the processing further comprises: applying an ensemble classifier to the second set of the sparse datasets to predict a set of probabilities for a sub-period of the second sparse datasets at the first time granularity with a period of a second time granularity, the second time granularity being multiple time steps of the first time granularity, the period of the second time granularity being one of a weekly time granularity or a monthly time granularity; applying a probability filter to the set of probabilities to determine a period probability corresponding to the sub-period of the second sparse datasets with the period of the second time granularity; determining whether the period probability is equal or below a probability threshold; responsive to determining the period probability being above a probability threshold, confirming the prediction values and the second set of the adjusted residuals for the sub-period of datasets within the time period; and generating a visualized explanation based on the second set of the prediction values and a second set of adjusted residuals for one or more of the first set of sparse datasets.
 5. The method of claim 4, wherein the processing further comprises: responsive to determining the period probability being equal to or below a probability threshold, setting zero as the prediction values for respective sub-period of datasets within the time period.
 6. The method of claim 1, wherein each residual is indicative of a difference between each respective data value and respective prediction value corresponding to each respective data point.
 7. The method of claim 1, wherein the visualized explanation comprises a respective prediction value embedded with texts and graphs presented in one or more temporal features.
 8. The method of claim 1, wherein the time series forecasting model is trained with respective time series datasets corresponding to respective sparsity levels of the time series datasets.
 9. The method of claim 1, wherein the regression model is a quantile regression model which is trained with a set of respective parameters of respective sparsity levels of the time series datasets.
 10. The method of claim 9, wherein a set of respective parameters comprise a data value, a set of quantile values, and a plurality of temporal features comprising a date, day, week, month, day of the week, and week of the month.
 11. A computing system, comprising: a server computing device comprising a processor and a memory; a database in communication with the processor and configured to store a plurality of time series datasets, and a machine learning system comprising a time series forecasting model, a regression model and an ensemble classifier, the machine learning system including computer-executable instructions stored in a memory and executed by the processor to cause the server computing device to perform processing comprising: receiving, from a database in communication with the processor, a plurality of time series datasets each corresponding to a data point indicative of a data value, each data point corresponding to each respective subset of previous data points at a first time granularity within a time window; generating a first set of sparse datasets having sparsity levels equals to or below a sparsity threshold and a second set of sparse datasets having sparsity levels above the sparsity threshold; applying a time series forecasting model to each respective subset of previous data points of the first set of sparse datasets increasingly at the first time granularity to generate a first set of prediction values and a first set of residuals; applying a regression model to the first set of the prediction residuals to generate a first set of adjusted residuals for the first set of sparse datasets; and generating a visualized explanation based on the first set of the prediction values and the first set of adjusted residuals for one or more of the first set of sparse datasets.
 12. The system of claim 11, wherein the processing further comprises calculating a percentage of nonzero values of the time series dataset as each respective sparsity level of each respective time series dataset.
 13. The system of claim 11, wherein the processing further comprises: applying a time series forecasting model to each respective subset of previous data points of the second set of sparse datasets increasingly at the first time granularity to generate a second set of prediction values and a second set of residuals; and applying a regression model to the second set of the residuals to generate a second set of adjusted residuals for the second set of sparse datasets.
 14. The system of claim 13, wherein the processing further comprises: applying an ensemble classifier to the second set of the sparse datasets to predict a set of probabilities for a sub-period of the second sparse datasets at the first time granularity with a period of a second time granularity, the second time granularity being multiple time steps of the first time granularity, the period of the second time granularity being one of a weekly time granularity or a monthly time granularity; applying a probability filter to the set of probabilities to determine a period probability corresponding to the sub-period of the second sparse datasets with the period of the second time granularity; determining whether the period probability is equal or below a probability threshold; responsive to determining the period probability being above a probability threshold, confirming the prediction values and the second set of the adjusted residuals for the sub-period of datasets within the time period; and generating a visualized explanation based on the second set of the prediction values and a second set of adjusted residuals for one or more of the first set of sparse datasets.
 15. The system of claim 14, wherein the processing further comprises: responsive to determining the period probability being equal to or below a probability threshold, setting zero as the prediction values for respective sub-period of datasets within the time period.
 16. The system of claim 11, wherein each residual is indicative of a difference between each respective data value and respective prediction value corresponding to each respective data point.
 17. The system of claim 11, wherein the visualized explanation comprises a respective prediction value embedded with texts and graphs presented in one or more temporal features.
 18. The system of claim 11, wherein the time series forecasting model is trained with respective time series datasets corresponding to respective sparsity levels of the time series datasets.
 19. The system of claim 11, wherein the regression model is a quantile regression model which is trained with a set of respective parameters of respective sparsity levels of the time series datasets.
 20. The system of claim 19, wherein a set of respective parameters comprise a data value, a set of quantile values, and a plurality of temporal features comprising a date, day, week, month, day of the week, and week of the month.